Skip to content

Conversation

@DataCorrupted
Copy link
Member

@DataCorrupted DataCorrupted commented Aug 21, 2025

A "Sequence" in the line table consists of multiple machine instructions [LowPC, HighPC) and rows [FirstRowIndex, LastRowIndex).

bool isValid() const {
return !Empty && (LowPC < HighPC) && (FirstRowIndex < LastRowIndex);
}

When adding a Sequence, LastRow is RowNumbe + 1 while HighPC is Row.Address:

if (Row.EndSequence) {
// Record the end of instruction sequence.
Sequence.HighPC = Row.Address.Address;
Sequence.LastRowIndex = RowNumber + 1;
Sequence.SectionIndex = Row.Address.SectionIndex;
if (Sequence.isValid())
LineTable->appendSequence(Sequence);
Sequence.reset();
}

This means that if you look up LineTable's last row (LT->Rows.back()), you get nothing because its address is not considered to be part of the "Sequence".

Another problem is one-instruction functions can't be its own "Sequence" since LowPC == HighPC, the parser will think this is an invalid Sequence. Normally this is not a problem, but with ICF=safe-thunk, many one-instruction thunks are added. These functions' DebugLine can't be parsed as a stand alone "Sequence", which had made relocation (#149618) and verification (#152807) substantially harder.

I can think of two solutions:

  1. Define Sequence's PC to be [LowPC, HighPC + MinInstLength) shown in this diff
    • Unfortunately when I try this one out a bunch of tests in llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp, will fail.
  2. During ICF=safe-thunk, the thunk function can have two instructions (a nop before branch shown below) [ICF] Add a NOP after branch in ICF thunk to improve debugability #154986

static constexpr uint32_t icfSafeThunkCode[] = {
0x14000000, // 08: b target
};

@DataCorrupted DataCorrupted changed the title [DebugLine] Sequence HighPC should be LastRow.address + MinInstLength [NFC][DebugInfo] Sequence HighPC should be LastRow.address + MinInstLength Aug 21, 2025
@github-actions
Copy link

github-actions bot commented Aug 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@DataCorrupted
Copy link
Member Author

cc @nocchijiang

@DataCorrupted DataCorrupted changed the title [NFC][DebugInfo] Sequence HighPC should be LastRow.address + MinInstLength [NFC][DebugInfo] Allow single instruction sequence in line table to allow easier symbolication of merged functions Aug 21, 2025
@DataCorrupted DataCorrupted changed the title [NFC][DebugInfo] Allow single instruction sequence in line table to allow easier symbolication of merged functions [NFC][DebugInfo] Allow single instruction sequence in line table to make symbolication of merged functions easier Aug 21, 2025
Signed-off-by: Peter Rong <[email protected]>
@nocchijiang
Copy link
Contributor

Thank you for your work on this. When I was trying to create the minimal repro in the RFC thread, I had a chance to look at the code rebuilding sequences and realized that some of the existing design/implementation are somewhat incompatible with the goals for DW_AT_LLVM_stmt_sequence. For instance, the practice of merging neighboring sequences seems to be problematic. I see from #149618 you are trying to rebuild the sequences (again) just for DW_AT_LLVM_stmt_sequence, and I was wondering about the long-term direction. While this approach has minimal impact on other users, I am not entirely sure if it really is the "right" direction. As for this patch, I think the approach is good, but I cannot say if it would break things for others - I am really not an expert of DWARF-related matters.

Additionally, I believe that it is not a great idea to insert nops before thunks. Compilers sometimes generates functions consisting of a single branch. At ByteDance, we also teach our linker to emit thunks similar to LLD's -icf=safe_thunks, and we'd want to avoid making changes just to accommodate DWARF requirements.

Signed-off-by: Peter Rong <[email protected]>
@DataCorrupted
Copy link
Member Author

Compilers sometimes generates functions consisting of a single branch

I thought about this, and if we are to emit a nop after branch, we can assume all merged functions that requires DW_AT_LLVM_stmt_sequence has at least two instructions. If so, we can skip emitting DW_AT_LLVM_stmt_sequence for single-instruction functions, as such they will be symbolicated just fine, as if DW_AT_LLVM_stmt_sequence was never introduced. I've put up another draft for this: #154986

I think the approach is good, but I cannot say if it would break things for others

Unfortunately, it is, breaking somethings in LLDB/GSYM. I don't think I can move forward until I can receive confirmation from more experienced eyes: Are these breaks are benign and can be easily fixed, or did it break some fundamental assumptions in your work?

  • Some line tables that were considered illegal is now legal: lld :: ELF/undef.s @jh7370
  • Some tests are relying on the fact that the last row can't be found:
    • LLVM :: tools/llvm-dwarfdump/X86/locstats.ll
    • LLVM :: tools/llvm-symbolizer/symbol-search.test
    • LLVM-Unit :: DebugInfo/GSYM/./DebugInfoGSYMTests/GSYMTest/TestDWARFNoLines @clayborg

@DataCorrupted
Copy link
Member Author

Closed since #157529 fixed the root cause

DataCorrupted added a commit that referenced this pull request Sep 12, 2025
### Context

#99710 introduced `.loc_label` so we can terminate a line sequence.
However, it did not advance PC properly. This is problematic for
1-instruction functions as it will have zero-length sequence. The test
checked in that PR shows the problem:

```
# CHECK-LINE-TABLE:                  Address            Line   Column File   ISA Discriminator OpIndex Flags
# CHECK-LINE-TABLE-NEXT:             ------------------ ------ ------ ------ --- ------------- ------- -------------
# CHECK-LINE-TABLE-NEXT: 0x00000028: 05 DW_LNS_set_column (1)
# CHECK-LINE-TABLE-NEXT: 0x0000002a: 00 DW_LNE_set_address (0x0000000000000000)
# CHECK-LINE-TABLE-NEXT: 0x00000035: 01 DW_LNS_copy
# CHECK-LINE-TABLE-NEXT:             0x0000000000000000      1      1      1   0             0       0  is_stmt
# CHECK-LINE-TABLE-NEXT: 0x00000036: 00 DW_LNE_end_sequence
# CHECK-LINE-TABLE-NEXT:             0x0000000000000000      1      1      1   0             0       0  is_stmt end_sequence
```

Both rows having PC 0x0 is incorrect, and parsers won't be able to parse
them. See more explanation why this is wrong in #154851.

### Design

This PR attempts to fix this by advancing the PC to the next available
Label, and advance to the end of the section if no Label is available.

### Implementation

- `emitDwarfLineEndEntry` will advance PC to the `CurrLabel`
- If `CurrLabel` is null, its probably a fake LineEntry we introduced in
#110192. In that case look for the next Label
- If still not label can be found, use `null` and
`emitDwarfLineEndEntry` is smart enough to advance PC to the end of the
section
- Rename `LastLabel` to `PrevLabel`, "last" can mean "previous" or
"final", this is ambigous.
- Updated the tests to emit a correct label.

### Note

This fix should render #154986 and #154851 obsolete, they were temporary
fixes and don't resolve the root cause.

---------

Signed-off-by: Peter Rong <[email protected]>
dwblaikie pushed a commit that referenced this pull request Sep 16, 2025
This reverts commit aabf18d.
#157529 included a test that used clang, which doesn't exists in some
CI. #158343 reverts it.

This PR reapplies the original patch with the incorrect test removed.

### Context

#99710 introduced `.loc_label` so we can terminate a line sequence.
However, it did not advance PC properly. This is problematic for
1-instruction functions as it will have zero-length sequence. The test
checked in that PR shows the problem:

```
# CHECK-LINE-TABLE:                  Address            Line   Column File   ISA Discriminator OpIndex Flags
# CHECK-LINE-TABLE-NEXT:             ------------------ ------ ------ ------ --- ------------- ------- -------------
# CHECK-LINE-TABLE-NEXT: 0x00000028: 05 DW_LNS_set_column (1)
# CHECK-LINE-TABLE-NEXT: 0x0000002a: 00 DW_LNE_set_address (0x0000000000000000)
# CHECK-LINE-TABLE-NEXT: 0x00000035: 01 DW_LNS_copy
# CHECK-LINE-TABLE-NEXT:             0x0000000000000000      1      1      1   0             0       0  is_stmt
# CHECK-LINE-TABLE-NEXT: 0x00000036: 00 DW_LNE_end_sequence
# CHECK-LINE-TABLE-NEXT:             0x0000000000000000      1      1      1   0             0       0  is_stmt end_sequence
```

Both rows having PC 0x0 is incorrect, and parsers won't be able to parse
them. See more explanation why this is wrong in #154851.

### Design

This PR attempts to fix this by advancing the PC to the next available
Label, and advance to the end of the section if no Label is available.

### Implementation

- `emitDwarfLineEndEntry` will advance PC to the `CurrLabel`
- If `CurrLabel` is null, its probably a fake LineEntry we introduced in
#110192. In that case look for the next Label
- If still not label can be found, use `null` and
`emitDwarfLineEndEntry` is smart enough to advance PC to the end of the
section
- Rename `LastLabel` to `PrevLabel`, "last" can mean "previous" or
"final", this is ambigous.
- Updated the tests to emit a correct label.

### Note

This fix should render #154986 and #154851 obsolete, they were temporary
fixes and don't resolve the root cause.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants